Diatoms are highly successful marine and freshwater algae that contribute up to 20% of global carbon fixation. These
species are leading candidates for biofuel production owing to their ease of culturing and high fatty acid content. To
assist in strain improvement and downstream biofuel applications, it is important to understand the evolution of lipid
biosynthesis in diatoms. The evolutionary history of diatoms is, however, complicated by what were likely multiple
endosymbioses involving the capture of foreign cells and horizontal gene transfer into the host genome. Using a
phylogenomic approach, we assessed the evolutionary history of 12 diatom genes putatively encoding functions related
to lipid biosynthesis. We found evidence of gene transfer, likely from a green algal source, for seven of these genes,
with the remaining five showing either vertical inheritance or evolutionary histories too complicated to interpret given
current genome data. The functions of the horizontally transferred genes encompass all aspects of lipid biosynthesis
(initiation, biosynthesis, and desaturation of fatty acids) as well as fatty acid elongation, and are not restricted to
plastid-targeted proteins. Our findings demonstrate that the transfer, duplication, and subfunctionalization of genes
were key steps in the evolution of lipid biosynthesis in diatoms and other photosynthetic eukaryotes. This target pathway
for biofuel research is highly chimeric and, surprisingly, our results suggest that research done on related genes in
green algae may be applicable to diatom models.



Introduction 

Diatoms are among the most common phytoplankton in
aquatic environments, with an estimated diversity of 100,000
species. 1 These free-living, unicellular primary producers provide
oxygen via photosynthesis and are crucial for regulating
the biogeochemical cycle of silicon. 2 The hard siliceous struc- 
ture (frustule) surrounding diatoms has been utilized in bio- 
nanotechnology applications 3 as a cross-linking agent for active 
biomolecules; e.g., in immunoprecipitation. 4 In recent years, 
diatoms have also been targeted for biofuel production owing 
to the ease of mass culturing and their high fat content. 5 Given 
their ecological and potential economic value, it is important to 
understand the evolution of fatty acid biosynthesis in diatoms. 
These data may also assist in experimental and industrial designs 
of downstream biofuel applications relying on these organisms 
or the products that their genes encode. 

Fatty acids are used in many essential cellular processes, from
energy production (e.g., triglycerides), 6 to the synthesis of
membranes (e.g., phospholipids) 7 and hormones. 8 Fatty acid
biosynthesis (FAB) occurs in all living organisms, with the exception
of some parasites that have highly reduced genomes. 9 There are
two main types of FAB systems: Type I, which commonly utilizes a
single multifunctional protein complex, and Type II, which involves
multiple monofunctional enzymes. Both of these FAB systems
are present in prokaryotes and plastid-lacking eukaryotes. In
animals and fungi, the Type I system important for palmitate
synthesis is cytosolic, whereas Type II is involved in the production
of eight-carbon chains within the mitochondrial matrix. 10,11
In comparison, photosynthetic eukaryotes, e.g., plants, algae and
the polyphyletic group of photosynthetic protists often referred
to as "chromalveolates" (including diatoms), use only Type II
FAB, 12 specifically within the plastid (Fig. 1).

The origin and evolution of the plastid is explained by endosymbiosis,
during which genes are transferred from the engulfed
endosymbiont to the host genome. 13,14 Secondary endosymbiosis
that involves the engulfment of an existing (in this case, a
red) alga is thought to be the landmark event that gave rise to
photosynthesis in "chromalveolates" (e.g., diatoms, dinoflagellates,
cryptophytes, haptophytes), and to the complex plastid structure
(e.g., 3-4 bounding membranes, remnant nucleomorph in
cryptophytes) in these photosynthetic lineages. 14 Nevertheless, a
phylogenomic study 15 demonstrated that a substantial number
of genes in diatoms have arisen from green algal sources. This
surprising result may be explained by an additional (cryptic)
endosymbiosis in the ancestor of diatoms and other "chromalveolates."
Under this (still controversial) scenario, the plastid of
the captured green alga was lost but endosymbiont genes that
were transferred to the "chromalveolate" nucleus remain as
"footprints" of this association. Alternatively, all "green genes"
that have been discovered in diatoms and in other taxa such
as the stramenopile Ectocarpus siliculosus, 16 the dinoflagellate
Alexandrium tamarense, 17 the chlorarachniophyte Bigelowiella natans
and the cryptophyte Guillardia theta, 18 and the haptophyte
Emiliania huxleyi, 15,19 may be the by-product of dozens of
independent horizontal gene transfer (HGT) events. 20,21 Although
proving unambiguously either of these hypotheses is challenging
with current data, the cryptic endosymbiosis hypothesis provides
a testable prediction: there should be a variety of plastid-targeted
proteins encoded by green genes, suggesting that these pre-existing
functions associated with a green plastid have been co-opted to
expand the metabolic capacity of the red algal-derived organelle.
An example of this is provided by the existence of prasinophyte
(green algal)-derived genes for the photo-protective xanthophyll
cycle in "chromalveolates" that provides high photosynthetic
efficiency under fluctuating light conditions. 22

Beyond plastid functions, a recent analysis shows a mosaic
(red and/or green algal) origin of membrane transporters in diatoms,
with the majority of genes putatively derived from green
sources. 23,24 These results demonstrate more broadly the role of
genetic transfer as a driver of environmental adaptation and cell
evolution. In addition, the genomes of microbial eukaryotes,
particularly "chromalveolates," appear to host a large number
of horizontally transferred genes from various prokaryote
and eukaryote lineages. 17,25-28 Therefore, the
evolutionary history of FAB in diatoms is expected
to be complex, with potentially multiple different
genetic inputs including red and/or green algal
donors, as a result of endosymbiotic gene transfer
(EGT) as well as HGT from these and other
sources. Here we applied a phylogenetic approach
to examine the evolutionary history of a representative
set of genes in diatoms that are implicated in
the plastid FAB system to assess the impact of E/HGT
on FAB evolution in diatoms and microbial
eukaryotes in general.

Figure 1. Simplified illustration of Type II fatty acid biosynthesis (FAB) in photosynthetic
organisms. The synthesis of fatty acids takes place within the plastid. (A) The initial steps
involve the synthesis of malonyl-CoA by the enzyme acetyl-CoA carboxylase (ACCase).
(B) Malonyl-CoA is converted into malonyl-ACP by the enzyme FabD, following which a
series of Fab enzymes are engaged in the synthesis of saturated fatty acid chains. (C) The
production of unsaturated fatty acids is catalyzed by a number of fatty acid desaturases
(FADs).

Results and Discussion



We identified 12 genes in diatoms that have
putative functions implicated across all three major
phases in FAB: (a) the initiation of FAB, (b) the
synthesis of fatty acid chains, and (c) the desaturation
of fatty acid chains, as well as fatty acid elongation.
Table 1 shows the list of these genes with
their putative origins and presence/absence of
protein-targeting signals to the plastid. The phylogeny
of each of these gene families is shown in Figure
S1, following the order in Table 1. Interestingly,
using our approach, seven of the 12 proteins have a putative
green algal origin, and the remainder show convoluted
evolutionary histories that are too difficult to decipher with available
data. We found evidence of a plastid-targeting signal in most
of the enzymes encoded by these nuclear-encoded FAB-related
genes, consistent with the hypothesis that de novo FAB occurs in
the plastid in photosynthetic eukaryotes.

Initiation of fatty acid biosynthesis

One of the most striking examples of green algal derived genes 
is the Acc gene that encodes acetyl-CoA carboxylase (ACCase, 
EC 6.4.1.2), an enzyme critical for the first dedicated step of 
Type II FAB. Two of the key intermediate molecules in FAB 
are acetyl-CoA and malonyl-CoA. Malonyl-CoA is required for 
the production of the backbone structure of fatty acid chains. 29 
The enzyme ACCase is involved in the carboxylation of acetyl- 
CoA, yielding malonyl-CoA. The inhibition of ACCase produc- 
tion leads to cell death, making the enzyme a useful target for 
commercial herbicides in plants. 30 Expression of the Acc gene has 
been reported to regulate fatty acid composition in plant seeds. 31 

ACCase consists of four protein subunits: biotin carboxylase,
biotin carboxyl carrier protein, and two subunits of
carboxyltransferase. 32 In most plastid-bearing organisms, two types of
ACCase exist in the cells: (a) a plastidic, heteromeric ACCase
that contains two different subunits of carboxyltransferase
(α and β), and (b) a cytosolic, homomeric ACCase that
contains two identical carboxyltransferase subunits. Plastidic
heteromeric ACCase is essential for the synthesis of fatty acids in
the majority of photosynthetic organisms, 33 except in the grass
family in which it is lacking. 34 In contrast, homomeric ACCase
that is exclusively cytosolic in most photosynthetic organisms is
important for the synthesis of a number of other metabolites;
e.g., flavonoids, anthocyanins, malonated amino acids, and
ethylene precursors. 33

Table 1. List of FAB-related genes in diatoms that are used in this study

Gene               Encoded protein or putative function    Query (GI)   EC number   Plastid-targeting signal
-----------------  --------------------------------------  -----------  ----------  ------------------------
Acc*               Acetyl-CoA carboxylase                  224004864    6.4.1.2     Yes
ACS*               Acyl-CoA synthetase                     224003657    2.3.1.86    No
fabD               Malonyl-CoA:ACP transacylase            224001858    2.3.1.39    Yes
fabG*              3-ketoacyl-ACP reductase                224005350    1.1.1.100   No
fabH               β-ketoacyl-ACP synthase III             224013337    2.3.1.41    Yes
fad3*              Omega-3 fatty acid desaturase           224002771    1.14.99.-   Yes
fad?               Fatty acid desaturase (predicted)       224014800    1.14.99.-   No
Delta-6 FAD-like*  Delta-6 FAD-like protein                223999591    1.14.99.-   Yes
fad11*             Delta-11 palmitoyl-CoA desaturase       224000772    1.14.19.5   No
ELO1               Polyunsaturated fatty acid elongase 1   75108642     1.14.99.-   No
ELO2*              Polyunsaturated fatty acid elongase 2   224005955    1.14.99.-   No
ELO3               Polyunsaturated fatty acid elongase 3   220970795    1.14.99.-   No

For each gene, the corresponding encoded/predicted protein from Thalassiosira pseudonana is used as query (GenBank GI number shown). The
corresponding Enzyme Commission number and prediction of plastid-targeting signal for each of the encoded proteins is shown. Genes marked
with an asterisk (*) show evidence of putative green algal origin based on our phylogenetic analysis.
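
The summary counts drawn from Table 1 can be checked with a short tally. The following is a minimal sketch, not part of the original analysis: the rows are transcribed from Table 1 (asterisk read as "putative green algal origin"), and the names TABLE_1 and cross_tabulate are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Rows transcribed from Table 1:
# (gene, putative green algal origin [asterisk], predicted plastid-targeting signal)
TABLE_1 = [
    ("Acc", True, True),
    ("ACS", True, False),
    ("fabD", False, True),
    ("fabG", True, False),
    ("fabH", False, True),
    ("fad3", True, True),
    ("fad?", False, False),
    ("Delta-6 FAD-like", True, True),
    ("fad11", True, False),
    ("ELO1", False, False),
    ("ELO2", True, False),
    ("ELO3", False, False),
]


def cross_tabulate(rows):
    """Count genes by the pair (green_origin, plastid_targeted)."""
    return Counter((green, plastid) for _, green, plastid in rows)


counts = cross_tabulate(TABLE_1)
green_total = sum(n for (green, _), n in counts.items() if green)

print(f"genes analyzed: {len(TABLE_1)}")
print(f"putative green algal origin: {green_total}")
print(f"  with plastid-targeting signal: {counts[(True, True)]}")
print(f"  without plastid-targeting signal: {counts[(True, False)]}")
```

The split of the green-origin genes between targeted and non-targeted products is the basis for the statement that transferred genes are not restricted to plastid-targeted proteins.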

Figure 2A shows the phylogeny of the diatom ACCase protein 
family (complete tree shown in Fig. S1A). For the FAB-related 
plastid-targeted ACCase, we observe monophyly (bootstrap support
86%) of diatoms (Phaeodactylum tricornutum, Thalassiosira
pseudonana, and Fragilariopsis cylindrus) and prasinophyte green
algae (Ostreococcus and Micromonas), in the presence of other
algal lineages such as the red alga Porphyridium purpureum. This 
phylogeny is consistent with (i.e., does not prove) an algal origin 
of plastid-targeted ACCase in diatoms with the prasinophytes 
being a putative source. The absence of bootstrap support for the 
order of divergence within this clade however makes it impos- 
sible to infer the direction of gene transfer between red/green 
algae and "chromalveolates." Interestingly, we find no evidence 
of HGT in the non-plastid-targeted ACCase. These proteins 
in diatoms form a separate group with the other stramenopiles 
(including Phytophthora and Ectocarpus) external to the clade 
containing the plastid-targeted isoform. The distinct evolutionary
history of these two Acc isoforms suggests that the genes were
independently acquired in diatoms, other "chromalveolates," and
in the green plastid-containing Bigelowiella natans and Euglena
gracilis, and that this gene implicated in FAB has a putative
green algal origin in all of these taxa.

The origin of the Type II FAB system in diatoms (and shared
by other "chromalveolates") can be traced back to the origin
of the plastid itself. The plastid in "chromalveolates" is postulated
to have arisen from an endosymbiotic relationship with a red
alga, 13,14 and as proposed more recently, also potentially via
an earlier cryptic endosymbiosis with a green alga, in particular,
a prasinophyte. 15 Focusing on the prasinophyte lineages in
a tree that excludes both the other members of the clade within which
plastid-targeted diatom ACCases are found and the highly
diverged prokaryotic outgroups (Fig. 2B), we find that all
prasinophyte Acc genes fall into two subgroups, each
robustly supported (bootstrap 100%). This suggests that
the Acc gene underwent duplication in the common ancestor 
of Micromonas and Ostreococcus (and potentially other prasino- 
phytes). The lack of support for monophyly of the two paralogs 
is likely explained by sequence divergence that occurred after 
the duplication event, because there is no reason to believe that 
the prasinophyte host lineage is polyphyletic among eukary- 
otes. Under this scenario, plastidic ACCase in "chromalveo- 
lates" arose via the transfer (e.g., EGT) of the Acc gene that 
had undergone subfunctionalization in prasinophytes and was 
tailored specifically for Type II FAB within the plastid. In con- 
trast, the gene copy encoding cytosolic ACCase within plastid- 
bearing organisms appears to have been vertically inherited in 
"chromalveolates." 

Figure 2. Phylogenies of Acc gene families in diatoms, showing (A) all lineages and (B) a focus on the prasinophyte lineages. Only bootstrap
support >50% is shown on the internal branches. The diatoms and prasinophytes are highlighted in bold and each plastid-targeted protein is
marked with a dot. The full tree is shown in Figure S1A.

Synthesis of fatty acid chains

In the Type II system, the synthesis of fatty acid chains is a
sequential repetitive process involving a series of enzymes, using
acyl carrier protein (ACP) as a carrier molecule and the malonyl
side chain as donor to sequentially add two-carbon units to
the chain. 35,36 Multiple pathways are involved in the initiation of
FAB 36 and the one relevant to this study is shown in Figure 1.
Figures S1D and S1E show the phylogenies of two diatom enzymes
involved in this process: 3-ketoacyl-ACP reductase
(FabG, EC 1.1.1.100) and β-ketoacyl-ACP
synthase III (FabH, EC 2.3.1.41), respectively. FabH is the key enzyme in
the synthesis of β-ketoacyl-ACP, which, in an intermediate step
in FAB, is then converted into β-hydroxyacyl-ACP by FabG, 36
as shown in Figure 1. We found no clear evidence of HGT in
the evolutionary history of FabH, indicating vertical inheritance
(Fig. S1E): the diatom genes fall within a strongly
supported clade with the other stramenopiles, rhizarians, and the
haptophyte Emiliania huxleyi (the "chromalveolates"; bootstrap
76%), with substantial support for a sister lineage of alveolates,
and a strongly supported Viridiplantae clade (bootstrap 80%)
elsewhere on the tree. Interestingly, the FabG proteins in diatoms
(together with other stramenopiles and Rhizaria) are
clustered within a strongly supported monophyletic group
(bootstrap 86%) with prasinophytes (Fig. S1D). In the absence of
red algal and other green algal homologs in this tree, our results
suggest that FabG in diatoms shares an affiliation with prasinophytes,
an observation that is plausibly explained by HGT or, less
likely, by gene loss events in all other lineages. The third important
enzyme in this group is the malonyl-CoA:ACP transacylase
(FabD, EC 2.3.1.39) that converts malonyl-CoA into
malonyl-ACP. 36,37 This gene, while showing likely vertical inheritance in
diatoms (Fig. S1C), occurs as non-plastid-targeted proteins in
prasinophytes that are clustered in a well-supported monophyletic
relationship (85%) with the dinoflagellate Alexandrium
tamarense (and the choanoflagellate Monosiga), suggesting a
putative E/HGT association between the green algal and the
alveolate lineages. The prasinophytes, which represent a basal
lineage of green algae, may be sources of these genes, although
this aspect remains to be validated.
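
Because each round of the Type II cycle adds one two-carbon unit from malonyl-ACP, the number of rounds follows directly from the chain length. The following is a minimal worked sketch, assuming a two-carbon acetyl primer (the FabH-initiated route of Fig. 1); the function name elongation_cycles is hypothetical and not taken from the study.

```python
def elongation_cycles(chain_length: int, primer_carbons: int = 2) -> int:
    """Return the number of two-carbon additions needed to build a chain.

    Assumes synthesis starts from a primer of `primer_carbons` carbons
    (acetyl, 2 carbons, by default) and that every cycle adds exactly
    two carbons donated by malonyl-ACP.
    """
    if chain_length < primer_carbons:
        raise ValueError("chain is shorter than the primer")
    added = chain_length - primer_carbons
    if added % 2 != 0:
        raise ValueError("chain length is not reachable in two-carbon steps")
    return added // 2


# Chain lengths mentioned in the text: palmitate (16 carbons) and
# the eight-carbon product of mitochondrial Type II FAB.
for name, carbons in [("palmitate (16:0)", 16), ("eight-carbon chain", 8)]:
    print(f"{name}: {elongation_cycles(carbons)} cycles")
```

Under these assumptions palmitate requires seven cycles and an eight-carbon chain requires three.
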
Desaturation of fatty acid chains 

Unsaturated fatty acids are pivotal components in cell membranes
and therefore crucial for cell survival. The genomes of photosynthetic
organisms encode a variety of fatty acid desaturases (FADs).
FADs are highly labile, 38 and many
have stringent specificity; i.e., for the location on the carbon chain
(regiospecificity) and for select substrates. 39-41 At the molecular level,
an amino acid difference of as few as five residues can change
the regiospecificity of the enzymatic reaction of the protein.
Previous work has demonstrated that different sets of FADs can
operate in different pathways and subcellular compartments; 39,40
therefore, enzymes with the same or very similar functions within
the diatoms (or any organism) can have different evolutionary
histories. 41 Consequently, even using our stringent phylogenomic
approach, some highly similar proteins (e.g., from different
groups of FADs) could be included in the same phylogenetic tree.

As shown in Figure S1F, we found evidence of red and/or
green algal origin of the diatom fad3 gene, which encodes
omega-3 FAD, an important enzyme that converts linoleic
(18:2) into linolenic (18:3) acid; this is shown by two separate
monophyletic clades with bootstrap support of 99% and 93%,
respectively. Upon closer inspection, the proteins encoded in
T. pseudonana that show a putative green algal origin are also 
annotated as hypothetical proteins with putative omega-6 FAD 
function, as encoded by the fad6 gene. The phylogenetic tree of 
another predicted fatty acid desaturase in diatoms (Fig. S1G) 
shows no clear evidence of HGT. This finding suggests diver- 
gence of protein functions based on acquired genetic material 
from different sources, but this aspect remains to be validated 
as more genome data and better annotations become avail- 
able. Figure S1H shows the phylogeny of the delta-6 FAD-like 
gene family (also known as the FADS2). The delta-6 FAD-like 
domain (GenBank accession CD03506) includes integral mem- 
brane enzymes of both delta-6 and delta-8. We find within this 
gene family an association (bootstrap 69%), although not as 
strong as the commonly accepted threshold of >70%, between 
prasinophytes and "chromalveolate" (diatoms and haptophytes) 
lineages, with other diatom and green algal copies elsewhere in 
the tree. Our findings highlight the complicated issues associ- 
ated with inferring phylogenetic trees from the divergent protein 
families of FADs. 

There are two distinct types of FADs: (a) ACP (soluble) 
desaturases found only in plants and certain bacteria, and (b) 
membrane-bound (insoluble) desaturases found in most aerobic 
organisms including bacteria, fungi, plants, and animals. 42 In 
plants, ACP desaturases are associated with the plastid, whereas 
membrane-bound desaturases are found in both the plastid and 
the endoplasmic reticulum. 43 It is intriguing that ACP desaturases
are not found in extant cyanobacteria (Fig. S1D). Because
cyanobacteria gave rise to plastids, ACP desaturases in
plastid-bearing organisms can be explained by HGT from
non-cyanobacterial lineages into ancestral algal lineages or by gene loss
within cyanobacteria after establishment of the photosynthetic
eukaryotes. 42 Here we report a number of other green
algal-derived genes in diatoms that are involved in fatty acid
biosynthesis, desaturation, and elongation, including fad11, encoding
FAD11 that desaturates palmitic acid (Fig. S1I), ACS, encoding
acyl-CoA synthetase (Fig. S1B), and ELO2 (Fig. S1K), encoding
membrane-bound polyunsaturated fatty acid elongase (the
ELO superfamily). The other two diatom genes within the ELO
superfamily, ELO1 (Fig. S1J) and ELO3 (Fig. S1L), show possible
vertical inheritance or an evolutionary history that currently
precludes an easy explanation.

The "fat revolution" in diatoms 

Our findings demonstrate that HGT, duplication, and
subfunctionalization of genes are key evolutionary processes in
the evolution of the Type II FAB system within photosynthetic
eukaryotes. In diatoms, and potentially in most if not all
stramenopiles, some of the key genes involved in the Type II FAB
system and in desaturation of fatty acids trace their origin to green
algal (likely prasinophyte) sources. The finding that components
of key plastid functions such as FAB and photo-protection 22 in
diatoms and other "chromalveolates" are prasinophyte-derived is
consistent with (but does not prove) the cryptic endosymbiosis
hypothesis. 15 It is also conceivable that prasinophytes provided a
rich and easily accessible source of genes for the ancestral
"chromalveolate" lineage(s), and that these genes underwent massive
HGT into these chlorophyll c-containing taxa. This would have resulted
in a highly chimeric red plastid proteome and "chromalveolate" nuclear
genome. Additional sources of evidence are needed to test these
ideas.

It should be noted that by relying on phylogenomics we make 
the implicit assumption that genes are transferred as a whole 
during an E/HGT event. The modularity of HGT, 44 the degree of
conservation within the gene family, and genome rearrangement
following HGT could all affect the delineation of HGT history
using phylogenetic comparisons; in the latter cases, alternative
phylogenomic approaches might be useful. 45 Furthermore, evolutionary
analysis of FADs is complicated by the duplicated nature
of these genes and the retention of high sequence similarity 
among homologs with different regiospecificities, which is the 
basis for gene annotation. Comparative biochemical validation of 
FAB pathways among photosynthetic "chromalveolate" and algal 
species will be a useful test of our results. Assessing functional
biases among the genes horizontally transferred into diatom
genomes (e.g., whether genes implicated in the FAB system are
more likely to have been transferred than genes involved
in other metabolic or cellular processes) can provide invaluable
insights into how FAB evolved in diatoms. Nevertheless,
given that FAB is a process crucial to the survival of organisms 
(in addition to photosynthesis in algae and plants), our results 
clearly demonstrate that endosymbiosis plays a more significant 
role in genome evolution and innovation of "chromalveolates" 
than previously thought. This includes the pathways that hold 
promise for providing biofuels in the near future. 

Materials and Methods

Data 

Twelve protein sequences with functions implicated in
FAB, as predicted from the genome of Thalassiosira pseudonana
CCMP1335 46 (Table 1), were used to query genome data. The
resulting alignments were used for phylogenetic analysis.

Phylogenetic analysis 

We applied a phylogenetic approach adopted from Chan
et al. 47 for inferring E/HGT events. For each of the protein
sequences, we searched for putative homologs within a local
database similar to the one used in an earlier study 23 but
updated with NCBI RefSeq release 51
(http://www.ncbi.nlm.nih.gov/RefSeq). This database, comprising ca. 17 million protein
sequences, includes other genomic sources of predicted proteins
(http://www.jgi.doe.gov/) and EST data
(http://www.ncbi.nlm.nih.gov/dbEST/; http://tbestdb.bcm.umontreal.ca/) from
other algae and protists. Other published transcriptome (or
genome, where available) data from the red algae Cyanidioschyzon
merolae, 48 Porphyridium purpureum, 49 Calliarthron
tuberculosum, 47 Chondrus crispus, 50 and Galdieria sulphuraria, 51 the
stramenopile Ectocarpus siliculosus, 16 and the dinoflagellate
Alexandrium tamarense 17 were also included in the database.
Homologous protein families were aligned using MUSCLE 52
at default settings, and phylogenies were reconstructed using
RAxML 53 with the WAG 54 model of amino acid substitution.
E/HGT into diatoms from other algae was inferred when
strongly supported monophyly (bootstrap value >75%) was
observed for a sister group relationship between diatoms (with
or without other "chromalveolates") and lineages of green and/or
red algae.
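
The inference rule stated above can be expressed as a simple filter over the supported clades of a tree. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: each tree is assumed to be pre-summarized as a list of (taxa, bootstrap) pairs, the lineage labels and the function name infer_algal_origin are hypothetical, and the membership of the second example clade is illustrative only (the 86% and 69% support values echo those reported for ACCase and the delta-6 FAD-like family).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Clade = Tuple[FrozenSet[str], float]  # (taxa in the clade, bootstrap support in %)

ALGAL: Set[str] = {"green", "red"}
# Lineages allowed inside a clade that supports an algal origin.
ALLOWED: Set[str] = ALGAL | {"diatom", "chromalveolate"}


def infer_algal_origin(
    clades: List[Clade], lineage: Dict[str, str], threshold: float = 75.0
) -> List[Tuple[FrozenSet[str], float, Set[str]]]:
    """Return clades that group diatoms with green and/or red algae.

    A clade qualifies when its support exceeds `threshold`, it contains at
    least one diatom and at least one green or red alga, and it contains
    no lineage other than diatoms, other "chromalveolates", and algae.
    """
    hits = []
    for taxa, support in clades:
        groups = {lineage[taxon] for taxon in taxa}
        donors = groups & ALGAL
        if support > threshold and "diatom" in groups and donors and groups <= ALLOWED:
            hits.append((taxa, support, donors))
    return hits


# Hypothetical lineage labels for the example taxa.
LINEAGE = {
    "Phaeodactylum tricornutum": "diatom",
    "Thalassiosira pseudonana": "diatom",
    "Fragilariopsis cylindrus": "diatom",
    "Ostreococcus": "green",
    "Micromonas": "green",
    "Emiliania huxleyi": "chromalveolate",
}

example_clades: List[Clade] = [
    # Diatoms plus prasinophytes, support as reported for plastid-targeted ACCase.
    (frozenset({"Phaeodactylum tricornutum", "Thalassiosira pseudonana",
                "Fragilariopsis cylindrus", "Ostreococcus", "Micromonas"}), 86.0),
    # Illustrative membership; support below the threshold, so not counted.
    (frozenset({"Thalassiosira pseudonana", "Emiliania huxleyi", "Micromonas"}), 69.0),
]

hits = infer_algal_origin(example_clades, LINEAGE)
for taxa, support, donors in hits:
    print(f"support {support:.0f}%: putative donor lineage(s) {sorted(donors)}")
```

A rule of this kind identifies an association between lineages but not the direction of transfer, which depends on the branching order within the clade.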

Prediction of protein subcellular targets 

Subcellular targets of all stramenopile proteins (including 
those of the diatoms) were determined using HECTAR
(http://www.sb-roscoff.fr/hectar/). 55